Networks appear in many fields, such as social media, global value chains, disease outbreaks, and internet browsing. Network analysis may allow us to:
NetworkX requires Python 3.7 or newer.
To install the latest release of the package, run pip install networkx[default].
To install the package without the dependencies (e.g., numpy, scipy), run pip install networkx.
Alternatively, the package can be downloaded manually from NetworkX's GitHub or PyPI repositories.
Creating a graph involves five basic steps:
1. Import the package: import networkx as nx
2. Create a graph object: g = nx.Graph()
3. Add nodes: g.add_node(node)
4. Add edges: g.add_edge(node_1, node_2)
5. Draw the graph: nx.draw(g)
Note: Step 3 may be skipped, because nodes are created automatically when an edge is added between nodes that do not yet exist.
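The five steps above can be sketched in a few lines. This is a minimal illustration rather than a canonical recipe: the node labels are arbitrary, and the drawing call is left as a comment because it needs matplotlib and a display.

```python
import networkx as nx          # step 1: import the package

g = nx.Graph()                 # step 2: create an empty undirected graph
g.add_node("a")                # step 3: add a node explicitly
g.add_edge("a", "b")           # step 4: add an edge; "b" is created on the fly
g.add_edge("c", "d")           # neither "c" nor "d" existed before this call

print(sorted(g.nodes()))       # all four nodes are present
print(g.number_of_edges())

# step 5 (needs matplotlib): nx.draw(g, with_labels=True)
```

Only "a" was added explicitly; the other three nodes exist because edges referred to them.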
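# Import the packages used throughout this tutorial
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd

# Create a networkx graph object
my_graph = nx.Graph()
# Add edges to the graph object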
# Each tuple represents an edge between two nodes
my_graph.add_edges_from([
(1,2),
(1,3),
(3,4),
(1,5),
(3,5),
(4,2),
(2,3),
(3,0)])
# Draw the resulting graph
nx.draw(my_graph, with_labels=True, font_weight='bold')
Creating a graph from a pandas DataFrame (from_pandas_edgelist)
# create a dataframe
df = pd.DataFrame({'from': ['A', 'B', 'C', 'A'],
'to': ['D', 'A', 'E', 'C']})
# create graph object
G = nx.from_pandas_edgelist(df, 'from', 'to')
# plot the network graph
nx.draw(G, with_labels=False, node_size=500, alpha=1, linewidths=20)
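from_pandas_edgelist can also carry extra columns over as edge attributes through its edge_attr argument. The sketch below assumes a hypothetical weight column that is not part of the example above.

```python
import networkx as nx
import pandas as pd

# hypothetical edge list; the 'weight' column is an assumption for illustration
edges = pd.DataFrame({"from": ["A", "B", "C"],
                      "to": ["D", "A", "E"],
                      "weight": [1.0, 2.5, 0.5]})

# each row becomes one edge; 'weight' is stored as an edge attribute
H = nx.from_pandas_edgelist(edges, source="from", target="to", edge_attr="weight")

print(H.number_of_nodes(), H.number_of_edges())
print(H["B"]["A"]["weight"])   # attribute lookup works in either direction
```

Because the graph is undirected, the attribute of the B-A edge can be read as H["B"]["A"] or H["A"]["B"].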
Node positions (pos) and graph elements (matplotlib integration)
# create nodes and edges
G = nx.Graph()
G.add_edge(1, 2)
G.add_edge(1, 3)
G.add_edge(1, 5)
G.add_edge(2, 3)
G.add_edge(3, 4)
G.add_edge(4, 5)
# set positions
pos = {1: (0, 0), 2: (-1, 0.3), 3: (2, 0.17), 4: (4, 0.255), 5: (5, 0.03)}
options = {
"font_size": 36,
"node_size": 3000,
"node_color": "white",
"edgecolors": "black",
"linewidths": 5,
"width": 5}
nx.draw_networkx(G, pos, **options)
# Set margins for the axes so that nodes aren't clipped
ax = plt.gca()
ax.margins(0.20)
plt.axis("off")
plt.show()
Directed graphs (DiGraph)
In each edge tuple, the source node must be specified before the target node.
# create graph object
G = nx.DiGraph([(0, 3), (1, 3), (2, 4), (3, 5), (3, 6), (4, 6), (5, 6)])
# group nodes by column
left_nodes = [0, 1, 2]
middle_nodes = [3, 4]
right_nodes = [5, 6]
# set the position according to column (x-coord)
pos = {n: (0, i) for i, n in enumerate(left_nodes)}
pos.update({n: (1, i + 0.5) for i, n in enumerate(middle_nodes)})
pos.update({n: (2, i + 0.5) for i, n in enumerate(right_nodes)})
nx.draw_networkx(G, pos, **options)
# Set margins for the axes so that nodes aren't clipped
ax = plt.gca()
ax.margins(0.20)
plt.axis("off")
plt.show()
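Because direction matters in a DiGraph, an edge from 0 to 3 does not imply an edge from 3 to 0. The following sketch uses a subset of the edges above to show how the two directions are queried.

```python
import networkx as nx

D = nx.DiGraph([(0, 3), (1, 3), (3, 5), (3, 6)])

print(D.has_edge(0, 3))             # source 0 -> target 3 exists
print(D.has_edge(3, 0))             # the reverse direction was never added
print(sorted(D.predecessors(3)))    # nodes with an edge pointing into 3
print(sorted(D.successors(3)))      # nodes that 3 points to
print(D.in_degree(3), D.out_degree(3))
```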
Marvel Cinematic Universe Social Network data from Tableau
# import data
msn = pd.read_csv("https://github.com/mkbunyi/Data-Viz-Tutorial-NetworkX/raw/main/marvel_social_network.csv")
msn.head()
| | Character Name | Line ID | Path | Ch Name Diff | Character ID | Character Name (copy) | Main Character | Name in Caps | Relation | Relation - Negative | ... | Relation Sentiment | Relationship | Set ID | Synergy Type | URL | 1 | Number of Records | S.No. | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abomination | 1 | 1 | NaN | 1 | Abomination | Abomination | ABOMINATION | Friends | NaN | ... | Positive | NaN | 1 | Reciprocal | NaN | 1 | 1 | 1 | 6581.620117 | 2663.285156 |
| 1 | Rhino | 1 | 2 | Rhino | 83 | Rhino | Abomination | RHINO | Friends | NaN | ... | Positive | Relationship with Abomination - | 1 | Reciprocal | NaN | 1 | 1 | 2 | 7942.243652 | 9485.032227 |
| 2 | Abomination | 2 | 1 | NaN | 1 | Abomination | Abomination | ABOMINATION | Nemesis | NaN | ... | Negative | NaN | 1 | Perfect | NaN | 1 | 1 | 3 | 6581.620117 | 2663.285156 |
| 3 | Hulk | 2 | 2 | Hulk | 45 | Hulk | Abomination | HULK | Nemesis | Nemesis | ... | Negative | Relationship with Abomination - | 1 | Perfect | NaN | 1 | 1 | 4 | 4651.433594 | 5910.624023 |
| 4 | Abomination | 3 | 1 | NaN | 1 | Abomination | Abomination | ABOMINATION | Enemies | NaN | ... | Negative | NaN | 1 | Normal | NaN | 1 | 1 | 5 | 6581.620117 | 2663.285156 |
5 rows × 25 columns
# reformat to combine linked characters in 1 row
# collapse rows by line ID and combine linked characters in a list per cell
msn_nx = msn.groupby('Line ID').agg(lambda x: x.tolist())
msn_nx = msn_nx[["Character Name","Character ID"]]
# split list and allocate separate columns for the linked characters
msn_nx = pd.concat([msn_nx["Character Name"].apply(pd.Series),
msn_nx["Character ID"].apply(pd.Series)],
axis=1).reset_index()
# add relation type
msn_nx = msn_nx.merge(msn[["Line ID","Relation","Relation Sentiment"]],
on="Line ID", how = "left").drop_duplicates()
# rename columns
msn_nx.columns = ['Line ID', 'Char1_Name', 'Char2_Name', 'Char1_ID', 'Char2_ID', 'Relation', 'Relation Sentiment']
# reset index
msn_nx = msn_nx.reset_index()
# set color according to relation sentiment
msn_nx['color'] = np.where(msn_nx['Relation Sentiment']=="Positive",
"green",
"black")
msn_nx['color'] = np.where(msn_nx['Relation Sentiment']=="Negative",
"red",
msn_nx['color'])
# view reformatted data
msn_nx.head()
| | index | Line ID | Char1_Name | Char2_Name | Char1_ID | Char2_ID | Relation | Relation Sentiment | color |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | Abomination | Rhino | 1 | 83 | Friends | Positive | green |
| 1 | 2 | 2 | Abomination | Hulk | 1 | 45 | Nemesis | Negative | red |
| 2 | 4 | 3 | Abomination | She-Hulk | 1 | 90 | Enemies | Negative | red |
| 3 | 6 | 4 | Abomination | Red Hulk | 1 | 82 | Enemies | Negative | red |
| 4 | 8 | 5 | Abomination | King Groot | 1 | 59 | Friends | Positive | green |
# Initialize a graph object
G = nx.from_pandas_edgelist(msn_nx,
'Char1_Name',
'Char2_Name',
edge_attr=["Relation","Relation Sentiment"])
# Generate layout for visualization
pos = nx.kamada_kawai_layout(G)
# Manual position tweaking
pos["Captain America"] += (0, -1)
We will set node size proportional to the number of links.
# node size is proportional to number of links
links = dict.fromkeys(G.nodes(), 0.0)
for (node1, node2, attrib) in G.edges(data=True):
    links[node1] += 1
    links[node2] += 1
fig, ax = plt.subplots(figsize=(40, 40))
# draw edges
nx.draw_networkx_edges(G, pos, alpha=1, width=5,
edge_color=[msn_nx["color"][i] for i in list(range(len(msn_nx)))])
# draw nodes
nx.draw_networkx_nodes(G, pos,
node_size = [links[i]*500 for i in G],
node_color="blue", alpha=1,
label=[msn_nx["Char1_Name"][i] for i in list(range(len(msn_nx)))])
# draw labels
label_options = {"ec": "black", "fc": "white", "alpha": .9}
nx.draw_networkx_labels(G, pos, font_size=30, bbox=label_options)
# display title
font = {"color": "black", "fontweight": "bold", "fontsize": 40}
ax.set_title("Marvel Social Network", font)
# Resize figure for label readability
ax.margins(0.1, 0.05)
fig.tight_layout()
plt.axis("off")
plt.show()
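The drawing code above takes edge colors from the DataFrame rows, which relies on the rows and G.edges having the same order and the same length. A more defensive variant, sketched here on a toy graph with a hypothetical sentiment attribute, derives the colors from the edge attributes so that the list always lines up with the edges being drawn.

```python
import networkx as nx

# toy graph; the 'sentiment' attribute name is an assumption for illustration
S = nx.Graph()
S.add_edge("Hulk", "Abomination", sentiment="Negative")
S.add_edge("Hulk", "Black Widow", sentiment="Positive")
S.add_edge("Black Widow", "Quake", sentiment="Neutral")

palette = {"Positive": "green", "Negative": "red"}

# one color per edge, in exactly the order that S.edges iterates;
# any sentiment missing from the palette falls back to black
edge_colors = [palette.get(data["sentiment"], "black")
               for _, _, data in S.edges(data=True)]
print(edge_colors)

# the list can then be passed to the drawing call, e.g.
# nx.draw_networkx_edges(S, pos, edge_color=edge_colors)
```

Reading the colors back from the graph avoids a mismatch if duplicate rows collapse into a single edge.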
# initialize
top_links = {}
# iterate through nodes to count connections
for char in G.nodes:
    top_links[char] = len(G[char])
# convert to dataframe
s = pd.Series(top_links, name='connections')
df = s.to_frame().sort_values('connections', ascending=False)
df
| | connections |
|---|---|
| Black Widow | 20 |
| Hulk | 20 |
| Wolverine | 19 |
| Spider-Man (Classic) | 18 |
| Iron Man | 18 |
| ... | ... |
| King Groot | 4 |
| Blade | 4 |
| Killmonger | 4 |
| Punisher 2099 | 3 |
| Psylocke | 3 |
118 rows × 1 columns
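The connection counts above are node degrees, so a similar ranking can be obtained from the graph's degree view. The sketch below uses a toy graph with arbitrary labels. The two counts agree as long as the graph has no self-loops, which degree counts twice.

```python
import networkx as nx

T = nx.Graph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])  # toy graph

# manual count of neighbours, as in the loop above
manual = {node: len(T[node]) for node in T.nodes}

# built-in equivalent: degree of every node
degrees = dict(T.degree())
print(manual == degrees)

# nodes ranked from most to least connected
ranking = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
print(ranking[0])
```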
msn_blackwidow = msn_nx[msn_nx["Char1_Name"]=="Black Widow"].reset_index()
msn_blackwidow.head(20)
| | level_0 | index | Line ID | Char1_Name | Char2_Name | Char1_ID | Char2_ID | Relation | Relation Sentiment | color |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80 | 160 | 81 | Black Widow | Archangel | 10 | 5 | Teammates | Positive | green |
| 1 | 81 | 162 | 82 | Black Widow | Black Panther (Civil War) | 10 | 9 | Friends | Positive | green |
| 2 | 82 | 164 | 83 | Black Widow | Captain Marvel | 10 | 15 | Friends | Positive | green |
| 3 | 83 | 166 | 84 | Black Widow | Crossbones | 10 | 19 | Rivals | Negative | red |
| 4 | 84 | 168 | 85 | Black Widow | Daredevil (Classic) | 10 | 23 | Romance | Positive | green |
| 5 | 85 | 170 | 86 | Black Widow | Elektra | 10 | 32 | Rivals | Negative | red |
| 6 | 86 | 172 | 87 | Black Widow | Falcon | 10 | 33 | Enemies | Negative | red |
| 7 | 87 | 174 | 88 | Black Widow | Hawkeye | 10 | 41 | Romance | Positive | green |
| 8 | 88 | 176 | 89 | Black Widow | Hulk | 10 | 45 | Avengers | Positive | green |
| 9 | 89 | 178 | 90 | Black Widow | Hulk (Ragnarok) | 10 | 46 | Lullaby | Positive | green |
| 10 | 90 | 180 | 91 | Black Widow | Hulkbuster | 10 | 47 | Avengers | Positive | green |
| 11 | 91 | 182 | 92 | Black Widow | Iceman | 10 | 49 | Teammates | Positive | green |
| 12 | 92 | 184 | 93 | Black Widow | Ms. Marvel | 10 | 72 | Friends | Positive | green |
| 13 | 93 | 186 | 94 | Black Widow | Quake | 10 | 81 | S.H.I.E.L.D Clearance | Neutral | black |
| 14 | 94 | 188 | 95 | Black Widow | Sentry | 10 | 89 | Friends | Positive | green |
| 15 | 95 | 190 | 96 | Black Widow | Thor (Jane Foster) | 10 | 102 | Friends | Positive | green |
| 16 | 96 | 192 | 97 | Black Widow | Ultron | 10 | 104 | Enemies | Negative | red |
| 17 | 97 | 194 | 98 | Black Widow | Void | 10 | 111 | Overcoming Fear | Neutral | black |
| 18 | 98 | 196 | 99 | Black Widow | War Machine | 10 | 113 | Teammates | Positive | green |
| 19 | 99 | 198 | 100 | Black Widow | Winter Soldier | 10 | 114 | Romance | Positive | green |
We will use the spring layout, which tends to place Black Widow near the middle of the graph because every edge in this subset connects to her.
# initialize plot
fig, ax = plt.subplots(figsize=(10, 8))
# Initialize a graph object
G = nx.from_pandas_edgelist(msn_blackwidow,
'Char1_Name',
'Char2_Name',
edge_attr=["Relation","Relation Sentiment"])
# Draw using a spring layout
nx.draw_spring(G,with_labels=True,
edge_color=[msn_blackwidow["color"][i] for i in list(range(len(msn_blackwidow)))],
node_color = "gainsboro",
node_size = 2000,
font_size=11)
# Resize figure for label readability
fig.tight_layout()
plt.axis("off")
plt.margins(x=0.4)
plt.show()
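An alternative to filtering the DataFrame is to cut a character's neighbourhood out of the full graph with nx.ego_graph. This is a sketch on a toy graph with made-up edges, not the approach used above. Note that ego_graph also keeps edges between the neighbours themselves, which the filtered DataFrame would not contain.

```python
import networkx as nx

# toy data: these edges are invented for illustration only
full = nx.Graph([("Black Widow", "Hulk"), ("Black Widow", "Hawkeye"),
                 ("Hulk", "Hawkeye"), ("Hulk", "Abomination")])

# nodes within one hop of the center, plus all edges among them
ego = nx.ego_graph(full, "Black Widow", radius=1)

print(sorted(ego.nodes()))
print(ego.has_edge("Hulk", "Hawkeye"))   # neighbour-to-neighbour edge is kept
print("Abomination" in ego)              # two hops away, so excluded
```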
Connected components
# load in data
cities = pd.read_csv("https://github.com/mkbunyi/Data-Viz-Tutorial-NetworkX/raw/main/distances.csv")
cities.head()
| | node1 | node2 | distance |
|---|---|---|---|
| 0 | Mannheim | Frankfurt | 85 |
| 1 | Mannheim | Karlsruhe | 80 |
| 2 | Erfurt | Wurzburg | 186 |
| 3 | Munchen | Numberg | 167 |
| 4 | Munchen | Augsburg | 84 |
# create graph object along with nodes and edges
g = nx.Graph()
for edge in range(len(cities)):
    g.add_edge(cities["node1"][edge],
               cities["node2"][edge],
               weight=cities["distance"][edge])
Use the connected_components function to identify distinct sub-groups
for i, x in enumerate(nx.connected_components(g)):
    print("cc"+str(i)+":", x)
cc0: {'Erfurt', 'Wurzburg', 'Mannheim', 'Frankfurt', 'Karlsruhe', 'Stuttgart', 'Augsburg', 'Numberg', 'Munchen', 'Kassel'}
cc1: {'Kolkata', 'Bangalore', 'Delhi', 'Mumbai'}
cc2: {'ALB', 'NY', 'TX'}
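Each component is returned as a set of nodes, so the components can be sorted by size or turned into subgraphs. The sketch below runs on a toy graph; the node names are borrowed from the output above, but the edges are assumed for illustration.

```python
import networkx as nx

# toy graph with three separate groups (edges are illustrative assumptions)
toy = nx.Graph([("Mannheim", "Frankfurt"), ("Mannheim", "Karlsruhe"),
                ("Delhi", "Mumbai"), ("NY", "ALB")])

# components sorted from largest to smallest
components = sorted(nx.connected_components(toy), key=len, reverse=True)
print(len(components))

# induced subgraph of the largest component, copied so it can be edited freely
largest = toy.subgraph(components[0]).copy()
print(sorted(largest.nodes()))
print(nx.number_connected_components(toy))
```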
Spring layout: the k parameter sets the optimal distance between nodes. The higher the value of k, the farther apart the nodes are placed.
# set layout
pos = nx.spring_layout(g, k=5, seed=10)
# plot the network
nx.draw(g,pos,
with_labels = True, #labels nodes
node_color='lightsteelblue',
edge_color='slategrey')
# label edges (distance between cities)
edge_labels = nx.get_edge_attributes(g,'weight')
nx.draw_networkx_edge_labels(g,pos,edge_labels=edge_labels,
font_size=9,rotate=False)
plt.show()
print(nx.shortest_path_length(g, 'Frankfurt','Stuttgart',weight='weight'))
print(nx.shortest_path(g, 'Frankfurt','Stuttgart',weight='weight'))
503
['Frankfurt', 'Wurzburg', 'Numberg', 'Stuttgart']
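With weight='weight', the reported length is the sum of the edge weights along the returned path, not the number of hops. The sketch below checks this on a toy graph with made-up weights.

```python
import networkx as nx

w = nx.Graph()
w.add_edge("A", "B", weight=1)
w.add_edge("B", "C", weight=2)
w.add_edge("A", "C", weight=5)   # direct but more expensive

path = nx.shortest_path(w, "A", "C", weight="weight")
length = nx.shortest_path_length(w, "A", "C", weight="weight")

# summing the weights along the path reproduces the reported length
total = sum(w[u][v]["weight"] for u, v in zip(path, path[1:]))
print(path, length, total)

# without weights, the path with the fewest edges is returned instead
hops = nx.shortest_path(w, "A", "C")
print(hops)
```

Omitting the weight argument treats every edge as equal, which can return a different route.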
This algorithm measures node importance based on the number and quality of its (incoming and outgoing) links. Its use cases include:
NetworkX covers various centrality measures, such as:
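As a rough illustration of two such measures, the sketch below computes degree and betweenness centrality on a small star graph. The choice of measures and of graph is an assumption made for this example.

```python
import networkx as nx

star = nx.star_graph(3)   # node 0 in the center, leaves 1, 2 and 3

# fraction of the other nodes each node is directly connected to
deg = nx.degree_centrality(star)
# fraction of shortest paths between other nodes that pass through each node
btw = nx.betweenness_centrality(star)

# the most central node under each measure
print(max(deg, key=deg.get), max(btw, key=btw.get))
```

In a star graph both measures single out the center, since every path between two leaves passes through it.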